{2023.06}[2023a,a64fx] first part of apps originally built with EB 4.9.0 #1038

Merged

trz42
Collaborator

@trz42 trz42 commented Apr 22, 2025

First part of apps originally built with EB 4.9.0

@trz42 trz42 added the 2023.06-software.eessi.io (2023.06 version of software.eessi.io) and a64fx labels Apr 22, 2025

eessi-bot bot commented Apr 22, 2025

Instance eessi-bot-mc-aws is configured to build for:

  • architectures: x86_64/generic, x86_64/intel/haswell, x86_64/intel/sapphirerapids, x86_64/intel/skylake_avx512, x86_64/intel/cascadelake, x86_64/intel/icelake, x86_64/amd/zen2, x86_64/amd/zen3, aarch64/generic, aarch64/neoverse_n1, aarch64/neoverse_v1
  • repositories: eessi.io-2023.06-compat, eessi.io-2023.06-software

@eessi-bot-deucalion

Instance eessi-bot-deucalion is configured to build for:

  • architectures: aarch64/a64fx
  • repositories: eessi.io-2023.06-software


eessi-bot bot commented Apr 22, 2025

Instance eessi-bot-mc-azure is configured to build for:

  • architectures: x86_64/amd/zen4
  • repositories: eessi.io-2023.06-compat, eessi.io-2023.06-software

@gpu-bot-ugent

gpu-bot-ugent bot commented Apr 22, 2025

Instance eessi-bot-vsc-ugent is configured to build for:

  • architectures: x86_64/amd/zen3
  • repositories: eessi-hpc.org-2023.06-software, eessi.io-2023.06-compat, eessi-hpc.org-2023.06-compat, eessi.io-2023.06-software

@eessi-bot-surf

Instance eessi-bot-surf is configured to build for:

  • architectures: x86_64/amd/zen4, x86_64/amd/zen2
  • repositories: eessi-hpc.org-2023.06-software, eessi.io-2023.06-software, eessi.io-2023.06-compat, eessi-hpc.org-2023.06-compat

@eessi-bot-toprichard

Instance rt-Grace-jr is configured to build for:

  • architectures: aarch64/nvidia/grace
  • repositories: eessi.io-2023.06-software

@trz42
Collaborator Author

trz42 commented Apr 22, 2025

bot: build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx
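
The command above carries `key:value` filters, and the bot replies that follow show that only the instance matching them submits a job. A minimal sketch of that filtering, assuming exact string matching; `parse_bot_command` and `instance_should_build` are hypothetical names for illustration, not the EESSI bot's actual API, and the real bot may match filters differently:

```python
from typing import Optional


def parse_bot_command(comment: str) -> Optional[dict[str, str]]:
    """Parse 'bot: <action> key:value ...' into a dict (None if not a bot command)."""
    text = comment.strip()
    if not text.startswith("bot:"):
        return None
    tokens = text[len("bot:"):].split()
    if not tokens:
        return None
    parsed = {"action": tokens[0]}
    for token in tokens[1:]:
        # Split on the first ':' only, so values such as 'aarch64/a64fx' stay intact.
        key, sep, value = token.partition(":")
        if sep and key and value:
            parsed[key] = value
    return parsed


def instance_should_build(parsed: dict[str, str], instance_name: str,
                          architectures: list[str]) -> bool:
    """Assumed rule: build only if every filter present matches this instance."""
    if parsed.get("action") != "build":
        return False
    if "instance" in parsed and parsed["instance"] != instance_name:
        return False
    if "architecture" in parsed and parsed["architecture"] not in architectures:
        return False
    return True
```

Under these assumptions, the command in this PR would be accepted by an instance named `eessi-bot-deucalion` that is configured for `aarch64/a64fx`, and ignored by all others.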


eessi-bot bot commented Apr 22, 2025

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx from trz42

    • expanded format: build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx
  • handling command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx resulted in:

    • no jobs were submitted


eessi-bot bot commented Apr 22, 2025

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx from trz42

    • expanded format: build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx
  • handling command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx resulted in:

    • no jobs were submitted

@eessi-bot-deucalion

eessi-bot-deucalion bot commented Apr 22, 2025

Updates by the bot instance eessi-bot-deucalion (click for details)
  • received bot command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx from trz42

    • expanded format: build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx
  • handling command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx resulted in:

@eessi-bot-surf

eessi-bot-surf bot commented Apr 22, 2025

Updates by the bot instance eessi-bot-surf (click for details)
  • received bot command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx from trz42

    • expanded format: build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx
  • handling command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx resulted in:

    • no jobs were submitted

@eessi-bot-toprichard

Updates by the bot instance rt-Grace-jr (click for details)
  • account trz42 has NO permission to send commands to the bot

@gpu-bot-ugent

gpu-bot-ugent bot commented Apr 22, 2025

Updates by the bot instance eessi-bot-vsc-ugent (click for details)
  • received bot command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx from trz42

    • expanded format: build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx
  • handling command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx resulted in:

    • no jobs were submitted

@eessi-bot-deucalion

eessi-bot-deucalion bot commented Apr 22, 2025

New job on instance eessi-bot-deucalion for CPU micro-architecture aarch64-a64fx for repository eessi.io-2023.06-software in job dir /home/eessibot/new-bot/jobs/2025.04/pr_1038/409320

date job status comment
Apr 22 19:45:29 UTC 2025 submitted job id 409320 awaits release by job manager
Apr 22 19:46:11 UTC 2025 released job awaits launch by Slurm scheduler
Apr 22 19:47:15 UTC 2025 running job 409320 is running
Apr 23 03:41:16 UTC 2025 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-409320.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-aarch64-a64fx-1745377388.tar.gz
size: 170 MiB (179124566 bytes)
entries: 16043
modules under 2023.06/software/linux/aarch64/a64fx/modules/all
Cbc/2.10.11-foss-2023a.lua
Cgl/0.60.8-foss-2023a.lua
Clp/1.17.9-foss-2023a.lua
CoinUtils/2.11.10-GCC-12.3.0.lua
ESPResSo/4.2.1-foss-2023a.lua
GLPK/5.0-GCCcore-12.3.0.lua
GitPython/3.1.40-GCCcore-12.3.0.lua
HepMC3/3.2.6-GCC-12.3.0.lua
MPC/1.3.1-GCCcore-12.3.0.lua
MUMPS/5.6.1-foss-2023a-metis.lua
Osi/0.108.9-GCC-12.3.0.lua
PuLP/2.8.0-foss-2023a.lua
PyYAML/6.0-GCCcore-12.3.0.lua
Rivet/3.1.9-gompi-2023a-HepMC3-3.2.6.lua
YODA/1.9.9-GCC-12.3.0.lua
expecttest/0.1.5-GCCcore-12.3.0.lua
fastjet-contrib/1.053-gompi-2023a.lua
fastjet/3.4.2-gompi-2023a.lua
gmpy2/2.1.5-GCC-12.3.0.lua
libyaml/0.2.5-GCCcore-12.3.0.lua
networkx/3.1-gfbf-2023a.lua
pytest-flakefinder/1.1.0-GCCcore-12.3.0.lua
pytest-rerunfailures/12.0-GCCcore-12.3.0.lua
pytest-shard/0.1.2-GCCcore-12.3.0.lua
scikit-learn/1.3.1-gfbf-2023a.lua
siscone/3.0.6-GCCcore-12.3.0.lua
snakemake/8.4.2-foss-2023a.lua
sympy/1.12-gfbf-2023a.lua
wrapt/1.15.0-gfbf-2023a.lua
software under 2023.06/software/linux/aarch64/a64fx/software
Cbc/2.10.11-foss-2023a
Cgl/0.60.8-foss-2023a
Clp/1.17.9-foss-2023a
CoinUtils/2.11.10-GCC-12.3.0
ESPResSo/4.2.1-foss-2023a
GLPK/5.0-GCCcore-12.3.0
GitPython/3.1.40-GCCcore-12.3.0
HepMC3/3.2.6-GCC-12.3.0
MPC/1.3.1-GCCcore-12.3.0
MUMPS/5.6.1-foss-2023a-metis
Osi/0.108.9-GCC-12.3.0
PuLP/2.8.0-foss-2023a
PyYAML/6.0-GCCcore-12.3.0
Rivet/3.1.9-gompi-2023a-HepMC3-3.2.6
YODA/1.9.9-GCC-12.3.0
expecttest/0.1.5-GCCcore-12.3.0
fastjet-contrib/1.053-gompi-2023a
fastjet/3.4.2-gompi-2023a
gmpy2/2.1.5-GCC-12.3.0
libyaml/0.2.5-GCCcore-12.3.0
networkx/3.1-gfbf-2023a
pytest-flakefinder/1.1.0-GCCcore-12.3.0
pytest-rerunfailures/12.0-GCCcore-12.3.0
pytest-shard/0.1.2-GCCcore-12.3.0
scikit-learn/1.3.1-gfbf-2023a
siscone/3.0.6-GCCcore-12.3.0
snakemake/8.4.2-foss-2023a
sympy/1.12-gfbf-2023a
wrapt/1.15.0-gfbf-2023a
other under 2023.06/software/linux/aarch64/a64fx
2023.06/init/easybuild/eb_hooks.py
Apr 23 03:41:16 UTC 2025 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ SKIP ] ( 1/11) Skipping test: nodes in this partition only have 30720 MiB memory available (per node) accodring to the current ReFrame configuration, but 49152 MiB is needed
[ SKIP ] ( 2/11) Skipping test: nodes in this partition only have 30720 MiB memory available (per node) accodring to the current ReFrame configuration, but 49152 MiB is needed
[ SKIP ] ( 3/11) Skipping test: nodes in this partition only have 30720 MiB memory available (per node) accodring to the current ReFrame configuration, but 44236.8 MiB is needed
[ SKIP ] ( 4/11) Skipping test: nodes in this partition only have 30720 MiB memory available (per node) accodring to the current ReFrame configuration, but 44236.8 MiB is needed
[ SKIP ] ( 5/11) Skipping test: nodes in this partition only have 30720 MiB memory available (per node) accodring to the current ReFrame configuration, but 44236.8 MiB is needed
[ OK ] ( 6/11) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node /6672deda @BotBuildTests:aarch64_a64fx+default
P: latency: 1.74 us (r:0, l:None, u:None)
[ OK ] ( 7/11) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node /1b24ab8e @BotBuildTests:aarch64_a64fx+default
P: bandwidth: 8603.95 MB/s (r:0, l:None, u:None)
[ OK ] ( 8/11) EESSI_ESPRESSO_LJ_PARTICLES %module_name=ESPResSo/4.2.2-foss-2023b %scale=1_node /3370ce9a @BotBuildTests:aarch64_a64fx+default
P: perf: 0.01151 s/step (r:0, l:None, u:None)
[ OK ] ( 9/11) EESSI_ESPRESSO_LJ_PARTICLES %module_name=ESPResSo/4.2.2-foss-2023a %scale=1_node /ce9ec58a @BotBuildTests:aarch64_a64fx+default
P: perf: 0.01083 s/step (r:0, l:None, u:None)
[ OK ] (10/11) EESSI_ESPRESSO_LJ_PARTICLES %module_name=ESPResSo/4.2.1-foss-2023a %scale=1_node /a7cd00d1 @BotBuildTests:aarch64_a64fx+default
P: perf: 0.01065 s/step (r:0, l:None, u:None)
[ OK ] (11/11) EESSI_LAMMPS_lj %device_type=cpu %module_name=LAMMPS/2Aug2023_update2-foss-2023a-kokkos %scale=1_node /04ff9ece @BotBuildTests:aarch64_a64fx+default
P: perf: 580.034 timesteps/s (r:0, l:None, u:None)
[ PASSED ] Ran 6/11 test case(s) from 11 check(s) (0 failure(s), 5 skipped, 0 aborted)
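
The five skipped cases in the summary above all come from a comparison between the memory a test needs and the per-node memory in the ReFrame configuration. A minimal sketch of such a check; `memory_skip_reason` is a hypothetical helper and not part of ReFrame or the EESSI test suite:

```python
from typing import Optional


def memory_skip_reason(available_mib: float, needed_mib: float) -> Optional[str]:
    """Return a skip message if the test needs more memory than a node offers."""
    if needed_mib > available_mib:
        return (f"nodes only have {available_mib:g} MiB memory available (per node), "
                f"but {needed_mib:g} MiB is needed")
    return None  # enough memory: the test can run
```

With the values reported above (30720 MiB available, 49152 MiB or 44236.8 MiB needed), this check returns a message in both cases, which is consistent with those tests being skipped rather than failed.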
Details
✅ job output file slurm-409320.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
Apr 23 06:39:45 UTC 2025 uploaded transfer of eessi-2023.06-software-linux-aarch64-a64fx-1745377388.tar.gz to S3 bucket succeeded
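
The SUCCESS verdict for the build job rests on the pattern checks listed under Details: some strings must be absent from the job output, others must be present. A minimal sketch of that logic, assuming plain regular-expression searches over the text of `slurm-409320.out`; the function names are hypothetical and this is not the bot's actual check script:

```python
import re

# Patterns taken from the checklist in the bot comment.
MUST_BE_ABSENT = ["FATAL:", "ERROR:", "FAILED:", "required modules missing:"]
MUST_BE_PRESENT = ["No missing installations", r"\.tar\.gz created!"]


def check_job_output(text: str) -> dict[str, bool]:
    """Map each check description to True (passed) or False (failed)."""
    results = {}
    for pattern in MUST_BE_ABSENT:
        results[f"no message matching {pattern}"] = re.search(pattern, text) is None
    for pattern in MUST_BE_PRESENT:
        results[f"found message matching {pattern}"] = re.search(pattern, text) is not None
    return results


def job_succeeded(text: str) -> bool:
    """Assumed rule: the job counts as successful only if every check passes."""
    return all(check_job_output(text).values())
```

Requiring the positive matches as well as the absence of error strings guards against an output file that is empty or truncated, which would otherwise pass the "no error" checks trivially.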

@trz42 trz42 added the ready-to-deploy (Mark a PR as ready to deploy) and ready-to-review labels Apr 23, 2025
Collaborator

@TopRichard TopRichard left a comment

The eb_hooks.py is included in the tarball, so it has changed, likely as a result of merging PR #1034

@TopRichard TopRichard added ready-to-review and removed ready-to-review ready-to-deploy Mark a PR as ready to deploy labels Apr 23, 2025
@trz42 trz42 added the bot:deploy (Ask bot to deploy missing software installations to EESSI) label Apr 23, 2025
@eessi-bot-toprichard

Label bot:deploy has been set by user trz42, but this person does not have permission to trigger deployments

@trz42
Collaborator Author

trz42 commented Apr 23, 2025

The eb_hooks.py is included in the tarball, so it has changed, likely as a result of merging PR #1034

Good catch. Then we should verify if the eb_hooks.py in the tarball is newer or older than the one on /cvmfs
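
One way to do this check without unpacking the whole tarball is to read the single member and diff it against the installed file. A minimal sketch using only the Python standard library; `diff_tarball_member` is a hypothetical helper, and the member name passed to it must match the path as stored inside the tarball, which is an assumption here:

```python
import difflib
import tarfile
from pathlib import Path


def diff_tarball_member(tarball_path: str, member_name: str, installed_path: str) -> str:
    """Return a unified diff from the installed file to the copy inside the tarball."""
    with tarfile.open(tarball_path, "r:gz") as tar:
        # extractfile() raises KeyError for a missing member, returns None for non-files.
        extracted = tar.extractfile(member_name)
        if extracted is None:
            raise FileNotFoundError(f"{member_name} is not a regular file in {tarball_path}")
        new_lines = extracted.read().decode("utf-8").splitlines(keepends=True)
    old_lines = Path(installed_path).read_text(encoding="utf-8").splitlines(keepends=True)
    diff = difflib.unified_diff(old_lines, new_lines,
                                fromfile=str(installed_path), tofile=member_name)
    return "".join(diff)
```

An empty return value means the two copies are identical. For this PR the inputs would be the uploaded tarball, the member `2023.06/init/easybuild/eb_hooks.py`, and the copy under `/cvmfs/software.eessi.io/versions/2023.06/init/easybuild/`.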

@trz42
Collaborator Author

trz42 commented Apr 23, 2025

The eb_hooks.py is included in the tarball, so it has changed, likely as a result of merging PR #1034

Good catch. Then we should verify if the eb_hooks.py in the tarball is newer or older than the one on /cvmfs

It's good to ingest. It actually fixes an issue that was likely created by the last ingest (probably the TensorFlow PR #1034), where the eb_hooks.py was updated/adjusted after the package was built. The diff below (between the version on /cvmfs and the one in the tarball) illustrates the change:

$ diff -u /cvmfs/software.eessi.io/versions/2023.06/init/easybuild/eb_hooks.py 2023.06/init/easybuild/eb_hooks.py
--- /cvmfs/software.eessi.io/versions/2023.06/init/easybuild/eb_hooks.py	2025-04-20 18:18:22.000000000 +0100
+++ 2023.06/init/easybuild/eb_hooks.py	2025-04-22 20:47:23.000000000 +0100
@@ -133,7 +133,8 @@
     if memory_hungry_build or memory_hungry_build_a64fx:
         parallel = self.cfg['parallel']
         if cpu_target == CPU_TARGET_A64FX and self.name in ['TensorFlow']:
-            if parallel > 1:
+            # limit parallelism to 8, builds with 12 and 16 failed on Deucalion
+            if parallel > 8:
                 self.cfg['parallel'] = 8
                 msg = "limiting parallelism to %s (was %s) for %s on %s to avoid out-of-memory failures during building/testing"
                 print_msg(msg % (self.cfg['parallel'], parallel, self.name, cpu_target), log=self.log)
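
The effect of the one-line change in the diff is easiest to see with the two conditions isolated. A minimal sketch with standalone functions (hypothetical names; the real logic lives in the EasyBuild hook and operates on `self.cfg['parallel']`):

```python
def old_limit(parallel: int) -> int:
    """Condition as on /cvmfs: any value above 1 is set to 8."""
    if parallel > 1:
        return 8
    return parallel


def new_limit(parallel: int) -> int:
    """Condition as in the tarball: only values above 8 are capped at 8."""
    if parallel > 8:
        return 8
    return parallel


for requested in (1, 4, 8, 12, 16):
    print(requested, old_limit(requested), new_limit(requested))
```

With the old condition a request for 4 parallel processes is raised to 8, so the hook could increase parallelism rather than only limit it. The new condition leaves values of 8 or less unchanged and caps 12 and 16 at 8.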

@trz42
Collaborator Author

trz42 commented Apr 23, 2025

Staging PR merged, tarball ingested ...

@trz42 trz42 merged commit 2a4c8ae into EESSI:2023.06-software.eessi.io Apr 23, 2025
59 checks passed

eessi-bot bot commented Apr 23, 2025

PR merged! Moved [] to /project/def-users/SHARED/trash_bin/EESSI/software-layer/2025.04.23

1 similar comment

eessi-bot bot commented Apr 23, 2025

PR merged! Moved [] to /project/def-users/SHARED/trash_bin/EESSI/software-layer/2025.04.23

@eessi-bot-deucalion

PR merged! Moved ['/home/eessibot/new-bot/jobs/2025.04/pr_1038/409320'] to /home/eessibot/new-bot/trash-bin/EESSI/software-layer/2025.04.23

@gpu-bot-ugent

gpu-bot-ugent bot commented Apr 23, 2025

PR merged! Moved [] to /scratch/gent/vo/002/gvo00211/SHARED/trash_bin/EESSI/software-layer/2025.04.23

Labels
2023.06-software.eessi.io (2023.06 version of software.eessi.io), a64fx, bot:deploy (Ask bot to deploy missing software installations to EESSI)
3 participants